ABSTRACT

Science assessment has been included with mathematics
and language assessment on the international level since the 1970s.
This paper discusses techniques of assessment that have been utilized
to measure science process skills. The first section discusses the
Assessment Performance Unit, a British project with the aim of
developing innovative methods in assessing science achievement. Six
categories of science activities for assessment purposes were
identified by the project: (1) use of graphical and symbolic
representation; (2) use of apparatus and measuring instruments; (3)
observation; (4) interpretation and application; (5) planning of
investigations; and (6) performance of investigations. The second
section discusses the International Science Studies conducted by the
International Association for Evaluation of Educational Achievement
(IEA). Assessment items used for the first and second studies are
described. The remainder of the paper presents the assessment
techniques that were piloted for use in the third study. The
following categories of questions were used in the Third
International Math and Science Study: (1) multiple choice items; (2)
open-ended written items; (3) performance tasks, which produce a
physical product beyond writing; and (4) performance tasks where the
process of actually doing the task is documented and examined. Sample
items for each of these categories are presented and discussed.
(Contains 16 references.) (MDH)
Introduction



The purposes of educational evaluation/assessment generally have been manifold:
diagnosing, counselling, selecting, certifying, and evaluating curricula, teaching, or
educational systems (Johnson, 1987). Compared to such subjects as language and
mathematics, science has a shorter history of widespread activity in the field of
assessment. It is only since the 1970s that science has been included in national and
international assessment programs.

Early assessment in science concentrated on testing factual knowledge in the sciences. 
Questions were asked which typically had a "correct" answer. The growing knowledge of 
how children learn science led to an increasing change in the goals and objectives of 
teaching science. Science teaching has progressed towards an activity-based,
process-oriented curriculum, a change that cannot always be reflected in quantitative
assessment instruments. The result is that innovation in science assessment has often fallen
behind innovation in science teaching.

It is imperative that good instruments be developed to assess students' understanding in 
science. The instruments will need to assess not only factual information but also the 
manner in which we go about doing and learning science. Science assessment is essential 
for providing information on misunderstandings and alternative conceptions, as well as the 
possible reasons for why such obstacles occur. 

Science assessment includes large national and international tests which are used for 
international comparisons. Science assessment also includes that which teachers do in their 
classrooms on a regular basis. Though the two are interrelated in many ways, the main focus
of this paper is the large international science assessment projects conducted by the
International Association for Evaluation of Educational Achievement (IEA).



The Processes of Science

At the beginning of the 1970s there was a growing interest in the processes of science in
science teaching. Not only was it important to learn factual information in science, but
equally important was the way one went about learning science. The processes we refer to
here include observation, hypothesis testing, experimentation, classification and
communication. Science curriculum materials were developed with an emphasis on
processes, including Science A Process Approach (SAPA, 1967); Science Curriculum
Improvement Study (SCIS, 1974); and The Nuffield Project and Science 5/13 (1972).

Science process skills generally divide between those that are cognitive in nature and those
that relate to practical activities. Manipulative and observational skills, for example,
belong to the latter category, whilst recall and application of knowledge, the interpretation
of information and problem-solving are examples of cognitive skills. It should be noted
that the distinction between cognitive and practical skills is frequently only a matter of
convenience, for in many actual situations encountered in science education they come
together. For example, being able to follow instructions accurately when conducting
experiments may be a skill that relates primarily to the execution of a practical task, but it
also involves a significant cognitive element (Kempa, 1986).

It seems natural that if process-oriented objectives and activities are emphasized in
curricula, they should also be focused on in the assessment methods. However, science
assessment tends to lag behind science teaching objectives. Testing and assessment
methods are, in fact, often in conflict with the objectives expressed in the curriculum
([...]oon, 1990; Horst J[...]rd and Dalin, 1988; Johnson, 1987; Raaen, 1990; Swan,
1991). Until the end of the seventies, the main part of all tests in science asked for a mere
reproduction of factual information, even though Bloom's taxonomy of cognitive
objectives was often used as the basis for the assessment and test specification (Bloom,

The Assessment Performance Unit (APU) 

One of the first assessment projects that concentrated on the processes of science took 
place in England beginning in 1975. The Assessment Performance Unit (APU) project had 
the aim of developing innovative methods in assessing science achievement in both 
processes and content for pupils 11, 13 and 15 years old. 

The following APU framework for assessment reflects the underlying view of science
adopted. Six categories of science activities were identified for assessment purposes. The
framework is common to all three age groups.



1 Use of graphical and symbolic representation

- reading information from graphs, tables and charts
- representing information as graphs, tables and charts

2 Use of apparatus and measuring instruments

- using measuring instruments
- estimating physical quantities
- following instructions for practical work

3 Observation

- making and interpreting observations

4 Interpretation and application

- I interpreting presented information
- II applying: Biology concepts, Physics concepts, Chemistry concepts

5 Planning of investigations

- planning parts of investigations
- planning entire investigations

6 Performance of investigations

- performing entire investigations

The investigation surveys consist of written, individual practical and group practical tests,
all designed to assess how children "do" science by using the processes of science.

The APU researchers point out that there are skills and processes in science that can only
be assessed by means of practical tests. In APU it was shown that valid and reliable
assessment can be made of the complex activities that practical tests imply. The drawback
of such testing is that it is resource demanding, in both time and equipment. However, the
work of developing and categorizing science process and content skills may form a
valuable basis for diagnostic questions and tests that teachers can use in their own
assessment practices. The assessment of processes together with content in science allows
teachers to gain insights into pupils' thinking and reasoning.

The following (Item 1) is an example of a paper and pencil test from category 4,
Interpretation and application. It is also a typical example of how APU went about
developing a series of ingenious questions and test items to assess processes. Their
particular feature is that many of these explore "everyday" situations and do not therefore
require the pupil to possess specialized scientific knowledge.



Item 1

Mr Brown had a garden full of daffodils, crocuses and snowdrops
which came up year after year. This is what they looked like.

[Drawings: Daffodil, Crocus, Snowdrop]

Mr Brown kept a record of when the plants were in flower.

[Chart: flowering records for Year 1, Year 2 and Year 3, plotted against
early and late January, February, March and April, and early May]

(Mr Brown forgot to put snowdrops on the record in year 3!)

(a) What pattern do you notice in the chart about the times at which
crocuses and daffodils flowered?

(b) When do you think the snowdrops were in flower during year 3?

[...] fairly open-ended practical tasks in which pupils are given 30 minutes and
apparatus to [...]. [...] in a standardized way by a trained administrator, who then
[...] pupils' experimental technique, results [...].
Item 2

You have in front of you three kinds of paper towel, X, Y and Z. This is
what you have to find out:

Which kind of paper will hold most water?

You can use any of the things in front of you. Choose whatever you
need to answer the question.

Make a clear record of your results, so that I can understand what
you have found out.




One quickly notices that the examples from APU tasks are very different from typical
multiple-choice questions or questions that assume a correct answer. Children are asked to
use the processes of science in their solutions, and there may be multiple solutions.

The APU has influenced science assessment at every level. At the classroom level it has
provided tools for activity-based assessment of content and processes. At the level of
national and international science assessment, APU has guided the way for innovative
ideas in science process assessment using quantitative instruments.



The International Association for Evaluation of Educational 
Achievement (IEA) 

The IEA was created at the beginning of the 1960s as an international association of research
centers. Countries decide themselves whether or not they wish to participate in the
association, and today there are over 50 countries in the membership. IEA conducts
international studies in different subject areas with the following aims:

1. Analyze the impact that alternative curricular, teaching and administrative
strategies have on student achievement within countries.

2. Provide current international information which countries can use to compare and
[...] and student [...] within [...].
IEA has evaluated science achievement internationally since 1970, beginning with the First
International Science Study (FISS) project. The Second International Science Study (SISS)
was undertaken in 1984. The Third International Mathematics and Science Study
(TIMSS) will be administered in 1994.

The aim of the science achievement tests is to compare the intended and implemented
curriculum to the attained curriculum (what children have actually learned). Figure 1
illustrates the complexity of the educational environment.



Figure 1. Conceptual Framework for TIMSS 




IEA science tests are large-scale psychometric tests which are dominated by multiple-choice
[...]. The First International Science Study (FISS) was conducted in 1970 with 17
[...] Bloom and his colleagues and was further refined for employment in this study.
There were two strong criticisms of the FISS test. First, some items were too "wordy",
so that it was not clear whether reading skill was being measured or whether competence
in science was being assessed. Secondly, the items that had been designed as pencil and
paper tasks to assess achievement with respect to practical work in science did not
measure process skills in science in a valid way, nor did they measure science competence
as effectively as did the other content-based items. These two shortcomings suggested that
greater use should be made of diagrammatic material in the framing of questions, that
[...] should be more closely related to the body of content taught in [...], and that
pencil and paper practical items should not be incorporated as a specific
subset of the test items (Rosier and Keeves, 1992).

In the SISS investigation, six countries (Israel, Japan, Korea, Singapore, Hungary and
USA) included a practical test for ages 10 and 14. The skills were categorized as follows:
[...] and reasoning. Preliminary results indicate little correlation
between the actual SISS science test and the experimental practical science test. Tamir
(1987) stresses the importance of international comparisons of this type, which he [...]
between the intended and implemented curriculum between countries.

The Third International Math and Science Study (TIMSS) 

For the first time in IEA tradition, mathematics and science will be combined into one
international project. Currently 70 countries have expressed an interest in participating in
[...] 1993-94 [...]. As compared with
[...] been changed to include [...] 13 [...] the last [...].

TIMSS builds on the results of previous IEA science and math studies. IEA studies [...]
FIMS, FISS, SIMS and SISS addressed many issues which were of [...]:

* international variations in the mathematics and science curricula;

* opportunity to learn;

* attitudes and opinions of students and teachers;

* students' achievement, with particular emphasis on capability of students to
apply their knowledge and skill in non-routine applications;

* the role of technology in teaching and learning of mathematics and science;

* participation rates in college courses in mathematics and science, in
particular gender-based differences in rates of participation;

* practices employed by schools and school systems to direct students' course
selection, including tracking and streaming;

* the nature, role, and influence of officially prescribed textbooks on the
teaching of science and mathematics;

* the comparative efficacy of different approaches to the teaching of
mathematics and science on student outcomes.

The measurement techniques to be used in the TIMSS project must be reflective of the
emerging goals within science and mathematics education, including students'
reasoning, problem-solving and communicating techniques. The traditional multiple-choice
format will thus be supplemented by alternative and innovative assessment techniques.

The current framework for assessment in the TIMSS project is as follows: 

1 Traditional multiple choice items

2 Open-ended written items which require both short answers and longer
essay-type responses

3 Performance tasks which produce a physical product beyond writing

4 Performance tasks where the process of actually doing the task is
documented and examined



These alternative assessment techniques will be useful in testing [...].
[...] countries participating in the TIMSS [...]
preparation, communication of data to the test center, and other associated problems.

[...] the pre-pilot test which demonstrate the
categories of question types used in the TIMSS framework for assessment. Item examples
are taken from the test given to [...]
based on the pre-pilot experiences of Norway, Sweden and Denmark.

Multiple-choice items

incorrect. Item 3 is an example of such a question taken from the TIMSS Pre-pilot test. 
Item 3

[...] on to a fire, it puts the fire out by cutting off the

A. oxygen
B. nitrogen
C. helium
D. carbon dioxide



[...] perspective should also be taken into account. By this we mean that TIMSS [...]
the possibility to explore pupils' thinking [...] than the [...].

[...] "misconceptions" in science and mathematics. We therefore have [...] of references
from which we can construct good distractors in order to get important information about
student thinking.

[...] from (EKNA, 1979-1989) [...] students in the area of electricity.



Item 4

Put a tick in the correct answer.

[Figure: answer alternatives a to e, each showing a battery connected to a bulb]

Research on pupils' alternative conceptions in electricity shows that a lot of students have
a "one-pole" understanding (misconception) of the bulb and a "one-pole" understanding
of the battery. In Item 4 the distractors are [...]:
distractor a: "one-pole" battery and "one-pole" bulb; distractor b: "one-pole"
battery and "two-pole" bulb; distractor d: "two-pole" battery and "one-pole" bulb;
distractor e: "two-pole" battery and "one-pole" bulb.

The responses thus give clearly diagnostic information on how students are thinking.
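
The diagnostic use of these distractors can be illustrated with a minimal sketch. The distractor descriptions are copied from the text above as they stand; everything else is an assumption made for illustration only: the names `DISTRACTOR_MODEL`, `KEYED_ANSWER`, `diagnose` and `tally` are hypothetical, alternative c is assumed to be the keyed answer because it is the only alternative not listed as a distractor, and the sample responses are invented.

```python
from collections import Counter

# (battery model, bulb model) for each distractor, as listed in the text.
DISTRACTOR_MODEL = {
    "a": ("one-pole", "one-pole"),
    "b": ("one-pole", "two-pole"),
    "d": ("two-pole", "one-pole"),
    "e": ("two-pole", "one-pole"),
}

# Assumption: c is the keyed answer (the only alternative not described
# as a distractor in the text).
KEYED_ANSWER = "c"


def diagnose(choice):
    """Return a short description of the thinking a chosen alternative suggests."""
    if choice == KEYED_ANSWER:
        return "keyed answer"
    battery, bulb = DISTRACTOR_MODEL[choice]
    return f"{battery} battery, {bulb} bulb"


def tally(responses):
    """Count how many pupils fall into each diagnostic category."""
    return Counter(diagnose(choice) for choice in responses)


if __name__ == "__main__":
    # Invented responses from six hypothetical pupils.
    for category, count in tally(["a", "c", "b", "a", "d", "c"]).items():
        print(category, count)
```

A tally of this kind is what allows a multiple-choice item to report not only how many pupils answered incorrectly but also which model of the circuit they appear to hold.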



Item 5

[...] each with a different type of soil. The same number of beans [...].
The drawing shows the pots and the results after a few days.

[Drawing: three pots labelled Loam, Clay and Sand]

Why was the experiment NOT a good one for the purpose?

A. The size of the pots was not the same.
B. One pot should have been placed in the dark.
C. The plants would get too hot on the window sill.
D. Different amounts of water should have been used.

[...] in order to answer this process [...]. Multiple-choice questions test either
content or process, but almost never both.
Open-ended written items

Open-ended items differ from locked answer test items in that there is no one correct
answer to the problem given. Students are given the opportunity to provide a variety of
arguments which in turn may lead to variation in the solution to the problem. As
mentioned earlier in the paper, APU has played a substantial role in the introduction of
open-ended format questions in science, a subject that is typically assessed by questions
assuming a "correct" answer.

Item 6 is an example of an open-ended question taken from the TIMSS pre-pilot test.



Item 6 



You have a piece of string and you want to know how strong it is.
Write down what you think might be the best way to test the strength
of your piece of string.



The two following examples demonstrate typical student answers to Item 6: answer 1
comes from the category "complete answer", answer 2 from the category
"average answer". It is easy to see how this type of item allows us to understand how
students are thinking when they are solving problems. In this way both content and
process information may be assessed in the same question. Problems with open-ended
questions are often related to administration and interpretation, in that they are time
consuming and difficult to code objectively.



Answer 1:

[Handwritten student response not reproduced]

I would first fasten the string a little bit above the ground. Then I would fasten
(the knot weakens the string up to 50 %) a scale plate that first was weighed. Then
I would add more and more weight until it just held and then I would add up the
weight. The weight is then the string's strength.
Answer 2:

[Handwritten student response not reproduced]

One can tie the string to a weight and see when it breaks. Then you see how much
weight it held (and then you see) how strong the string is.

In large-scale IEA testing, open-ended testing has been tried on an experimental basis. The
TIMSS test will try to incorporate this type of item, though not without difficulty. The
pre-pilot test information has clearly demonstrated that if such questions are to be
included, extensive testing of the items must be done beforehand so that detailed
information may be provided on how each item is to be coded. Included in this
information should be examples of typical student answers from different categories of the
coding.



Performance tasks 

Performance tasks assume that students are given a practical problem to solve. They are
often characterized by the introduction of equipment as a part of the problem-solving
activity. A written account of the process of solving the problem is most often required at
the completion of the task. Many of the APU test items fall into this category, as
represented by Item 2, the Paper Towel Test.

Item 7 is an example of a performance task used on the TIMSS pre-pilot test for all three 
populations, where students were asked to work with a partner. 

Before solving the performance item, the pupils were given the following directions: 

This item is actually a problem-solving activity involving both mathematics and 
science. As in any scientific investigation, you may have to make a plan, execute it, 
and then record your plan, actions, and results. 

You will be working with a partner for this activity. You may quietly discuss your
thoughts and plans with your partner and may work together towards the solution
of the problem. However, each of you must write your own answers in your own
test booklet. You have 15 minutes to complete this activity.
Item 7

You are presently working at a desk, table or some other piece of
furniture. What size is it?




The following examples are considered "average" student responses. 
Answer 3.

[Handwritten student response not reproduced]

We measured the sides of the desk with the test paper. We found out that the length
was 70 cm and the width was 55 cm. The paper is 30 cm long in length.
Perimeter: 70+70+55+55=250cm
Area: 70 x 55=3850cm



Answer 4.

[Handwritten student response not reproduced]

I [...] myself about how much 1 cm is and then [...] the desk.
75cm x 56cm = 4200cm. The perimeter is 75cm+75cm+56cm+56cm=262cm



Answer 5.

[Handwritten student response not reproduced]

We [...] an eraser that was 2cm and a pencil that was 14cm. We measured the side
of the desk as 46cm and 53 cm.



It is easy to see from these examples that there is no one "correct" answer to the problem.
Students develop a plan for solving the problem and then proceed to follow through on the
plan. After they have obtained results they are asked to write down the procedure they
have completed, including the data.
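
The arithmetic reported in Answers 3 and 4 can be checked with a short sketch. It assumes the desk is treated as a rectangle, which is how both students treated it; the function name `rectangle_measures` is hypothetical and not part of the TIMSS materials. Note that the students reported the area in cm, whereas the product of two lengths in cm is in square centimetres.

```python
def rectangle_measures(length_cm, width_cm):
    """Return (perimeter in cm, area in square cm) of a rectangular desk top."""
    perimeter = 2 * (length_cm + width_cm)
    area = length_cm * width_cm
    return perimeter, area


if __name__ == "__main__":
    # Measurements reported in Answer 3 and Answer 4 respectively.
    print(rectangle_measures(70, 55))  # (250, 3850)
    print(rectangle_measures(75, 56))  # (262, 4200)
```

Both students' figures agree with their own measurements, which illustrates that different measuring strategies can each lead to an internally consistent result.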

When practical items are done correctly, assuming enough time and information, students
work actively with the processes of science while at the same time solving a
problem. This type of item is the best for representing the overall goals of the science
lesson. However, in large-scale testing, this type of item is not without complications.

The problems associated with coding for performance task items are the same as those
mentioned for open-ended items. Items must be pre-tested and categories established for
coding before the actual test is given.

Performance items are not familiar ways of assessing students in science and mathematics;
therefore some time is needed to introduce the procedure. In addition, when many
groups of students perform the task at the same time, they often look around to see what
others are doing and then "steal" ideas from each other. This problem may be alleviated if
simple equipment is given to each student rather than asking students to work in groups.
[...] distractors which provide diagnostic evaluation. We have also seen the emergence of
alternative assessment methods in science which include open-ended questions and
practical tasks. These newer methods have many strengths in the information provided;
however, they take time to administer and code, making them difficult to justify in large
projects.

The following chart is a short summary of the assessment methods discussed in this paper,
their strengths and their weaknesses.
Figure 2. Science and Mathematics Assessment Methods in TIMSS

Multiple-choice

  Content/process
    Positive:
    * large content coverage
    * different cognitive levels may be tested
    * diagnostic evaluation possible
    Negative:
    * either content or process tested, rarely both
    * reflective thinking process absent

  Objectivity/administration etc.
    Strengths:
    * objective, high reliability
    * easy coding and administration
    * easy to assess large populations
    Weaknesses:
    * difficult to construct good distractors

Open-ended

  Content/process
    Positive:
    * complex answers possible: both process and content
    * reflective thinking process possible to show
    * depth in content coverage
    Negative:
    * limited content coverage

  Objectivity/administration etc.
    Strengths:
    * high validity possible
    * items easy to construct
    Weaknesses:
    * time consuming to code/mark
    * objectivity difficult to achieve in large populations
    * reliability difficult to achieve in large populations
    * difficult to assess large populations

Performance task

  Content/process
    Positive:
    * process skills easily tested
    * group assessment possible
    * complex answers possible: both process and content
    * reflective thinking process possible to show
    * depth in content coverage
    Negative:
    * limited content coverage

  Objectivity/administration etc.
    Strengths:
    * items easy to construct
    * high validity possible
    Weaknesses:
    * requires equipment
    * time consuming to administer
    * difficult to assess large populations
    * objectivity difficult to achieve in large populations
    * reliability difficult to achieve in large populations

[...] classrooms. We would encourage that this type of test item be included in [...].
[...] information will have direct relevance to classroom evaluation.
[...] evaluation methods which encourage diagnostic [...]
important tool for the classroom mathematics and science teacher.